
Sahar Abdelnabi

Models That Know How Evaluations Are Designed Score Safer

May 27, 2026
Via arXiv

Measuring Security Without Fooling Ourselves: Why Benchmarking Agents Is Hard

May 21, 2026
Via arXiv

Decomposing and Measuring Evaluation Awareness

May 21, 2026
Via arXiv

No More, No Less: Task Alignment in Terminal Agents

May 12, 2026
Via arXiv

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

Feb 25, 2026
Via arXiv

Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems

Feb 16, 2026
Via arXiv

Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs

Feb 09, 2026
Via arXiv

ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

Nov 07, 2025
Via arXiv

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

Oct 30, 2025
Via arXiv

Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies

Oct 16, 2025
Via arXiv